<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="UTF-8">
	<meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
	<title>配置语言分析器 | Elasticsearch: 权威指南 | Elastic</title>
    <!-- Give IE8 a fighting chance -->
    <!--[if lt IE 9]>
    <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
    <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
	<link rel="stylesheet" type="text/css" href="../static/styles.css" />
</head>
<body>
<div class="main-container">
    <section id="content">
        
        <div class="content-wrapper">
            <section id="guide" lang="zh_cn">
                <div class="container">
                    <div class="row">
                        <div class="col-xs-12 col-sm-8 col-md-8 guide-section">
                            <div style="color:gray; word-break: break-all; font-size:12px;">原文地址: <a href="https://www.elastic.co/guide/cn/elasticsearch/guide/current/configuring-language-analyzers.html" rel="nofollow">https://www.elastic.co/guide/cn/elasticsearch/guide/current/configuring-language-analyzers.html</a>, 版权归 www.elastic.co 所有<br/>
                            英文版地址: <a href="https://www.elastic.co/guide/en/elasticsearch/guide/current/configuring-language-analyzers.html" rel="nofollow">https://www.elastic.co/guide/en/elasticsearch/guide/current/configuring-language-analyzers.html</a>
                            </div>
                        <!-- start body -->
                  <div class="page_header">
<b>请注意:</b><br>本书基于 Elasticsearch 2.x 版本，有些内容可能已经过时。
</div>
<div id="content">
<div class="breadcrumbs">
<span class="breadcrumb-link"><a href="index.html">Elasticsearch: 权威指南</a></span>
»
<span class="breadcrumb-link"><a href="languages.html">处理人类语言</a></span>
»
<span class="breadcrumb-link"><a href="language-intro.html">开始处理各种语言</a></span>
»
<span class="breadcrumb-node">配置语言分析器</span>
</div>
<div class="navheader">
<span class="prev">
<a href="using-language-analyzers.html">« 使用语言分析器</a>
</span>
<span class="next">
<a href="language-pitfalls.html">混合语言的陷阱 »</a>
</span>
</div>
<div class="section">
<div class="titlepage"><div><div>
<h2 class="title">
<a id="configuring-language-analyzers"></a>配置语言分析器<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elasticsearch-cn/elasticsearch-definitive-guide/edit/cn/200_Language_intro/20_Configuring.asciidoc">edit</a>
</h2>
</div></div></div>
<p>语言分析器都不需要任何配置，开箱即用，
它们中的大多数都允许你控制它们的各方面行为，具体来说：</p>
<div class="variablelist">
<a id="stem-exclusion"></a>
<dl class="variablelist">
<dt>
<span class="term">
词干提取排除
</span>
</dt>
<dd>
<p>想象下某个场景，用户们想要搜索 <code class="literal">World Health Organization</code> 的结果，
但是却被替换为搜索 <code class="literal">organ health</code> 的结果。有这个困惑是因为 <code class="literal">organ</code> 和 <code class="literal">organization</code> 有相同的词根： <code class="literal">organ</code> 。
通常这不是什么问题，但是在一些特殊的文档中就会导致有歧义的结果，所以我们希望防止单词 <code class="literal">organization</code> 和 <code class="literal">organizations</code> 被缩减为词干。</p>
</dd>
<dt>
<span class="term">
自定义停用词
</span>
</dt>
<dd>
<p>
英语中默认的停用词列表如下：
</p>
<pre class="literallayout">a, an, and, are, as, at, be, but, by, for, if, in, into, is, it,
no, not, of, on, or, such, that, the, their, then, there, these,
they, this, to, was, will, with</pre>

<p>关于单词 <code class="literal">no</code> 和 <code class="literal">not</code> 有点特别，这俩词会反转跟在它们后面的词汇的含义。或许我们应该认为这两个词很重要，不应该把他们看成停用词。</p>
</dd>
</dl>
</div>
<p>为了自定义 <code class="literal">english</code> （英语）分词器的行为，我们需要基于 <code class="literal">english</code> （英语）分析器创建一个自定义分析器，然后添加一些配置：</p>
<div class="pre_wrapper lang-js">
<pre class="programlisting prettyprint lang-js">PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_english": {
          "type": "english",
          "stem_exclusion": [ "organization", "organizations" ], <a id="CO128-1"></a><i class="conum" data-value="1"></i>
          "stopwords": [ <a id="CO128-2"></a><i class="conum" data-value="2"></i>
            "a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
            "if", "in", "into", "is", "it", "of", "on", "or", "such", "that",
            "the", "their", "then", "there", "these", "they", "this", "to",
            "was", "will", "with"
          ]
        }
      }
    }
  }
}

GET /my_index/_analyze?analyzer=my_english <a id="CO128-3"></a><i class="conum" data-value="3"></i>
The World Health Organization does not sell organs.</pre>
</div>
<div class="calloutlist">
<table border="0" summary="Callout list">
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO128-1"><i class="conum" data-value="1"></i></a></p>
</td>
<td align="left" valign="top">
<p>防止 <code class="literal">organization</code> 和 <code class="literal">organizations</code> 被缩减为词干</p>
</td>
</tr>
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO128-2"><i class="conum" data-value="2"></i></a></p>
</td>
<td align="left" valign="top">
<p>指定一个自定义停用词列表</p>
</td>
</tr>
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO128-3"><i class="conum" data-value="3"></i></a></p>
</td>
<td align="left" valign="top">
<p>切词为 <code class="literal">world</code> 、 <code class="literal">health</code> 、 <code class="literal">organization</code> 、 <code class="literal">does</code> 、 <code class="literal">not</code> 、 <code class="literal">sell</code> 、 <code class="literal">organ</code></p>
</td>
</tr>
</table>
</div>
<p>我们在 <a class="xref" href="stemming.html" title="将单词还原为词根"><em>将单词还原为词根</em></a> 和 <a class="xref" href="stopwords.html" title="停用词: 性能与精度"><em>停用词: 性能与精度</em></a> 中分别详细讨论了词干提取和停用词。</p>
</div>
<div class="navfooter">
<span class="prev">
<a href="using-language-analyzers.html">« 使用语言分析器</a>
</span>
<span class="next">
<a href="language-pitfalls.html">混合语言的陷阱 »</a>
</span>
</div>
</div>

                  <!-- end body -->
                        </div>
                        <div class="col-xs-12 col-sm-4 col-md-4" id="right_col">
                        
                        </div>
                    </div>
                </div>
            </section>
        </div>
    </section>
</div>
<script src="../static/cn.js"></script>
</body>
</html>